(54) Production and use of normalized DNA libraries



(57) Disclosed is a process for forming a normalized genomic DNA library from an environmental sample by
(a) isolating a genomic DNA population from the environmental sample; (b) analyzing the complexity of the
genomic DNA population so isolated; (c) at least one of (i) amplifying the copy number of the DNA population so
isolated and (ii) recovering a fraction of the isolated genomic DNA having a desired characteristic; and
(d) normalizing the representation of various DNAs within the genomic DNA population so as to form a normalized
library of genomic DNA from the environmental sample. Also disclosed is a normalized genomic DNA library
formed from an environmental sample by the process.

Description

[0001] The present invention relates to the field of production and screening of gene libraries, and more particularly 
to the generation and screening of normalized genomic DNA libraries from mixed populations of microbes and/or other 
organisms.

BACKGROUND OF THE INVENTION 

[0002] There has been increasing demand in the research reagent, diagnostic reagent and chemical process industries
for protein-based catalysts possessing novel capabilities. At present, this need is largely addressed using enzymes
purified from a variety of cultivated bacteria or fungi. However, because less than 1% of naturally occurring microbes 
can be grown in pure culture (Amann, 1995), alternative techniques must be developed to exploit the full breadth of 
microbial diversity for potentially valuable new products. 

[0003] Virtually all of the commercial enzymes now in use have come from cultured organisms. Most of these organisms
are bacteria or fungi. Amann et al. (Amann, 1995) have estimated the culturability of microorganisms in the environment
as follows:



Habitat                        Culturability (%)
Seawater                       0.001-0.1
Freshwater                     0.25
Mesotrophic lake               0.01-1.0
Unpolluted estuarine waters    0.1-3.0
Activated sludge               1.0-15.0
Sediments                      0.25
Soil                           0.3



[0004] These data were determined from published information regarding the number of cultivated microorganisms 
derived from the various habitats indicated. 

[0005] Other studies have also demonstrated that cultivated organisms comprise only a small fraction of the biomass 
present in the environment. For example, one group of workers recently reported the collection of water and sediment 
samples from the "Obsidian Pool" in Yellowstone National Park (Barns, 1994) where they found cells hybridizing to 
archaea-specific probes in 55% of 75 enrichment cultures. Amplification and cloning of 16S rRNA encoding sequences
revealed mostly unique sequences with little or no representation of the organisms which had previously been cultured 
from this pool, suggesting the existence of substantial diversity of archaea with so far unknown morphological, phys- 
iological and biochemical features. Another group performed similar studies on the cyanobacterial mat of Octopus 
Spring in Yellowstone Park and came to the same conclusion; namely, tremendous uncultured diversity exists (Ward, 
1990). Giovannoni et al. (1990) and Torsvik et al. (1990a) have reported similar results using bacterioplankton collected
in the Sargasso Sea and in soil samples, respectively. These results indicate that the exclusive use of cultured organ- 
isms in screening for useful enzymatic or other bioactivities severely limits the sampling of the potential diversity in 
existence. 

[0006] Screening of gene libraries from cultured samples has already proven valuable. It has recently been made 
clear, however, that the use of only cultured organisms for library generation limits access to the diversity of nature. 
The uncultivated organisms present in the environment, and/or enzymes or other bioactivities derived thereof, may be 
useful in industrial processes. The cultivation of each organism represented in any given environmental sample would 
require significant time and effort. It has been estimated that in a rich sample of soil, more than 10,000 different species
can be present. It is apparent that attempting to individually cultivate each of these species would be a cumbersome 
task. Therefore, novel methods of efficiently accessing the diversity present in the environment are highly desirable. 

SUMMARY OF THE INVENTION 

[0007] The present invention addresses this need by providing methods to isolate the DNA from a variety of sources, 
including isolated organisms, consortia of microorganisms, primary enrichments, and environmental samples, to make
libraries which have been "normalized" in their representation of the genome populations in the original samples, and 
to screen these libraries for enzyme and other bioactivities. 

[0008] The present invention represents a novel, recombinant approach to generate and screen DNA libraries constructed
from mixed microbial populations of cultivated or, preferably, uncultivated (or "environmental") samples. In
accordance with the present invention, libraries with equivalent representation of genomes from microbes that can
differ vastly in abundance in natural populations are generated and screened. This "normalization" approach reduces 
the redundancy of clones from abundant species and increases the representation of clones from rare species. These 
normalized libraries allow for greater screening efficiency resulting in the isolation of genes encoding novel biological 
catalysts.

[0009] The techniques described herein make the screening of mixed populations of organisms a rational approach.
Previously, attempts at screening mixed populations were not feasible and were avoided because of the cumbersome
procedures required.

[0010] Thus, in one aspect the invention provides a process for forming a normalized genomic DNA library from an 
environmental sample by (a) isolating a genomic DNA population from the environmental sample; (b) analyzing the
complexity of the genomic DNA population so isolated; (c) at least one of (i) amplifying the copy number of the DNA
population so isolated and (ii) recovering a fraction of the isolated genomic DNA having a desired characteristic; and
(d) normalizing the representation of various DNAs within the genomic DNA population so as to form a normalized
library of genomic DNA from the environmental sample.

[0011] In one preferred embodiment of this aspect, the process comprises the step of recovering a fraction of the
isolated genomic DNA having a desired characteristic. 

[0012] In another preferred embodiment of this aspect, the process comprises the step of amplifying the copy number 
of the DNA population so isolated. 

[0013] In another preferred embodiment of this aspect, the step of amplifying the genomic DNA precedes the normalizing
step. In an alternate preferred embodiment of this aspect, the step of normalizing the genomic DNA precedes
the amplifying step. 

[0014] In another preferred embodiment of this aspect, the process comprises both the steps of (i) amplifying the 
copy number of the DNA population so isolated and (ii) recovering a fraction of the isolated genomic DNA having a 
desired characteristic. 

[0015] Another aspect of the invention provides a normalized genomic DNA library formed from an environmental
sample by a process comprising the steps of (a) isolating a genomic DNA population from the environmental
sample; (b) analyzing the complexity of the genomic DNA population so isolated; (c) at least one of (i) amplifying the
copy number of the DNA population so isolated and (ii) recovering a fraction of the isolated genomic DNA having a
desired characteristic; and (d) normalizing the representation of various DNAs within the genomic DNA population so
as to form a normalized library of genomic DNA from the environmental sample. The various preferred embodiments
described with respect to the above method aspect of the invention are likewise applicable with regard to this aspect 
of the invention. 

[0016] The invention also provides a process for forming a normalized genomic DNA library from an environmental
sample by (a) isolating a genomic DNA population from the environmental sample; (b) analyzing the complexity of the
genomic DNA population so isolated; (c) at least one of (i) amplifying the copy number of the DNA population so isolated
and (ii) recovering a fraction of the isolated genomic DNA having a desired characteristic; and (d) normalizing the
representation of various DNAs within the genomic DNA population so as to form a normalized library of genomic DNA
from the environmental sample.

Another aspect of the invention provides a normalized genomic DNA library formed from an environmental sample
by a process comprising the steps of (a) isolating a genomic DNA population from the environmental sample; (b)
analyzing the complexity of the genomic DNA population so isolated; (c) at least one of (i) amplifying the copy number
of the DNA population so isolated and (ii) recovering a fraction of the isolated genomic DNA having a desired
characteristic; and (d) normalizing the representation of various DNAs within the genomic DNA population so as to form a
normalized library of genomic DNA from the environmental sample. The various preferred embodiments described
with respect to the above method aspect of the invention are likewise applicable with regard to this aspect of the
invention.

BRIEF DESCRIPTION OF THE DRAWING 

[0017] Figure 1 is a graph showing the percent of total DNA content represented by G + C in the various genomic
DNA isolates tested as described in Example 2. 

DETAILED DESCRIPTION OF THE INVENTION 

DNA ISOLATION:

[0018] An important step in the generation of a normalized DNA library from an environmental sample is the preparation
of nucleic acid from the sample. DNA can be isolated from samples using various techniques well known in the
art (Nucleic Acids in the Environment: Methods & Applications, J.T. Trevors, D.D. van Elsas, Springer Laboratory, 1995).
Preferably, DNA obtained will be of large size and free of enzyme inhibitors and other contaminants. DNA can be
isolated directly from the environmental sample (direct lysis) or cells may be harvested from the sample prior to DNA
recovery (cell separation). Direct lysis procedures have several advantages over protocols based on cell separation.
The direct lysis technique provides more DNA with a generally higher representation of the microbial community;
however, the DNA is sometimes smaller in size and more likely to contain enzyme inhibitors than DNA recovered using
the cell separation technique. Very useful direct lysis techniques have recently been described which provide DNA of high
molecular weight and high purity (Barns, 1994; Holben, 1994). If inhibitors are present, there are several protocols
which utilize cell isolation which can be employed (Holben, 1994). Additionally, a fractionation technique, such as the
bis-benzimide separation (cesium chloride isolation) described below, can be used to enhance the purity of the DNA.

ANALYSIS OF COMPLEXITY: 

[0019] The complexity of the nucleic acid recovered from the environmental samples can be important to
monitor during the isolation and normalization processes. 16S rRNA analysis is one technique that can be used to
analyze the complexity of the DNA recovered from environmental samples (Reysenbach, 1992; DeLong, 1992; Barns, 
1994). Primers have been described for the specific amplification of 16S rRNA genes from each of the three described 
domains. 
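
By way of illustration only, a tally of the distinct 16S rRNA sequences recovered from a sample might be sketched as follows. The function name, the exact-identity comparison and the toy reads are assumptions made for this sketch and are not taken from the cited protocols; in practice, sequences would typically be grouped by similarity rather than by exact identity.

```python
from collections import Counter


def summarize_complexity(sequences):
    """Count distinct sequences and the relative abundance of each.

    Sequences are compared by exact identity after upper-casing, which is
    an illustrative simplification (an assumption of this sketch).
    """
    counts = Counter(seq.strip().upper() for seq in sequences if seq.strip())
    total = sum(counts.values())
    abundances = {seq: n / total for seq, n in counts.items()}
    return len(counts), abundances


# Toy clone reads (hypothetical data, not from the examples).
clones = ["AGAGTTTGATCC", "agagtttgatcc", "GGTTACCTTGTT", "AGAGTTTGATCC"]
n_distinct, abundances = summarize_complexity(clones)
```

The number of distinct sequences gives a crude measure of how many organisms are represented, and the abundance table indicates how uneven the representation is before normalization.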

FRACTIONATION:

[0020] Fractionation of the DNA samples prior to normalization increases the chances of cloning DNA from minor 
species from the pool of organisms sampled. In the present invention, DNA is preferably fractionated using a density 
centrifugation technique. One example of such a technique is a cesium-chloride gradient. Preferably, the technique is 
performed in the presence of a nucleic acid intercalating agent which will bind regions of the DNA and cause a change 
in the buoyant density of the nucleic acid. More preferably, the nucleic acid intercalating agent is a dye, such as bis- 
benzimide which will preferentially bind regions of DNA (AT in the case of bis-benzimide) (Muller, 1975; Manuelidis, 
1977). When nucleic acid complexed with an intercalating agent, such as bis-benzimide, is separated in an appropriate 
cesium-chloride gradient, the nucleic acid is fractionated. If the intercalating agent preferentially binds regions of the 
DNA, such as GC or AT regions, the nucleic acid is separated based on relative base content in the DNA. Nucleic acid 
from multiple organisms can be separated in this manner. 
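
Because the separation described above depends on relative base content, it can be helpful to see how the G+C fraction of a sequence is computed. The following is a minimal sketch under stated assumptions: the function names and the toy fragments are hypothetical, and the code only ranks sequences by G+C fraction; it does not model dye binding or buoyant density.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases among the A, C, G, T bases."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    return sum(base in "GC" for base in counted) / len(counted)


def order_by_gc(named_sequences):
    """Sort (name, sequence) pairs from lowest to highest G+C fraction."""
    return sorted(named_sequences, key=lambda item: gc_fraction(item[1]))


# Toy fragments (hypothetical data).
fragments = [
    ("high_gc", "GCGCGGCCAT"),
    ("low_gc", "ATATTAGCAT"),
    ("mid_gc", "ATGCATGCAT"),
]
ranked = [name for name, _ in order_by_gc(fragments)]
```

Genomes whose average G+C fractions differ in this way would be expected to occupy different positions in the gradient, which is the property the fractionation step exploits.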

[0021] Density gradients are currently employed to fractionate nucleic acids. For example, the use of bis-benzimide 
density gradients for the separation of microbial nucleic acids for use in soil typing and bioremediation has been described.
In these experiments, one evaluates the relative abundance of A260 peaks within fixed benzimide gradients
before and after remediation treatment to see how the bacterial populations have been affected. The technique relies 
on the premise that on the average, the GC content of a species is relatively consistent. This technique is applied in 
the present invention to fractionate complex mixtures of genomes. The nucleic acids derived from a sample are subjected
to ultracentrifugation and fractionated while measuring the A260 as in the published procedures.

[0022] In one aspect of the present invention, equal A260 units are removed from each peak, the nucleic acid is
amplified using a variety of amplification protocols known in the art, including those described hereafter, and gene
libraries are prepared. Alternatively, equal A260 units are removed from each peak, and gene libraries are prepared
directly from this nucleic acid. Thus, gene libraries are prepared from a combination of equal amounts of DNA from 
each peak. This strategy enables access to genes from minority organisms within environmental samples and enrich- 
ments, whose genomes may not be represented or may even be lost, due to the fact that the organisms are present 
in such minor quantity, if a library was constructed from the total unfractionated DNA sample. Alternatively, DNA can be
normalized subsequent to fractionation, using techniques described hereafter. DNA libraries can then be generated 
from this fractionated/normalized DNA. 
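
The pooling of equal A260 units from each peak is a simple arithmetic step, sketched below under stated assumptions. One A260 unit is taken here as the absorbance at 260 nm multiplied by the volume in ml; the function name, the data layout and the example readings are hypothetical and are not taken from the published procedures.

```python
def equal_a260_pool(peaks, units_per_peak=None):
    """Volume (ml) to draw from each peak so all contribute equal A260 units.

    peaks maps a peak name to (a260, volume_ml). By default the target is
    the number of units available in the smallest peak.
    """
    available = {name: a260 * vol for name, (a260, vol) in peaks.items()}
    limit = min(available.values())
    target = limit if units_per_peak is None else units_per_peak
    if target <= 0 or target > limit:
        raise ValueError("requested units are not available in every peak")
    return {name: target / a260 for name, (a260, _vol) in peaks.items()}


# Hypothetical readings: (A260, fraction volume in ml) for three peaks.
peaks = {"peak_1": (2.0, 1.0), "peak_2": (0.5, 1.0), "peak_3": (1.0, 2.0)}
volumes = equal_a260_pool(peaks)
```

In this toy case the smallest peak limits the pool, so a minor peak contributes all of its material while the abundant peaks contribute only a portion of theirs.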

[0023] The composition of multiple fractions of the fractionated nucleic acid can be determined using PCR related 
amplification methods of classification well known in the art. 

NORMALIZATION: 

[0024] Previous normalization protocols have been designed for constructing normalized cDNA libraries (WO 
95/08647, WO 95/11986). These protocols were originally developed for the cloning and isolation of rare cDNAs derived
from mRNA. The present invention relates to the generation of normalized genomic DNA gene libraries from uncultured 
or environmental samples. 

[0025] Nucleic acid samples isolated directly from environmental samples or from primary enrichment cultures will 
typically contain genomes from a large number of microorganisms. These complex communities of organisms can be
described by the absolute number of species present within a population and by the relative abundance of each
organism within the sample. Total normalization of each organism within a sample is very difficult to achieve. Separation
techniques such as optical tweezers can be used to pick morphologically distinct members within a sample. Cells from
each member can then be combined in equal numbers, or pure cultures of each member within a sample can be
prepared and equal numbers of cells from each pure culture combined, to achieve normalization. In practice, this is
very difficult to perform, especially in a high-throughput manner.

[0026] The present invention involves the use of techniques to approach normalization of the genomes present within 
an environmental sample, generating a DNA library from the normalized nucleic acid, and screening the library for an 
activity of interest. 

[0027] In one aspect of the present invention, DNA is isolated from the sample and fractionated. The strands of
nucleic acid are then melted and allowed to selectively reanneal under fixed conditions (C0t-driven hybridization).
Alternatively, DNA is not fractionated prior to this melting process. When a mixture of nucleic acid fragments is melted
and allowed to reanneal under stringent conditions, the common sequences find their complementary strands faster
than the rare sequences. After an optional single-stranded nucleic acid isolation step, single-stranded nucleic acid,
representing an enrichment of rare sequences, is amplified and used to generate gene libraries. This procedure leads
to the amplification of rare or low abundance nucleic acid molecules. These molecules are then used to generate a 
library. While all DNA will be recovered, the identification of the organism originally containing the DNA may be lost. 
This method offers the ability to recover DNA from "unclonable sources." 
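
The reason common sequences reanneal faster than rare ones can be illustrated with the ideal second-order reassociation model, in which the fraction of a sequence still single-stranded after time t is 1 / (1 + k * C0 * t). The sketch below is an assumption-laden illustration: the rate constant, concentrations and time are arbitrary values in consistent units, the function names are hypothetical, and each sequence is treated as reannealing independently of the others.

```python
def single_stranded_fraction(c0, t, k=1.0):
    """Ideal second-order reassociation: fraction still single-stranded.

    c0 is the initial concentration of the sequence, t the annealing time
    and k a rate constant (arbitrary but consistent units).
    """
    return 1.0 / (1.0 + k * c0 * t)


def enrichment(c0_rare, c0_common, t, k=1.0):
    """Factor by which the rare:common ratio rises in the single-stranded pool."""
    return (single_stranded_fraction(c0_rare, t, k)
            / single_stranded_fraction(c0_common, t, k))


# Hypothetical mixture: one sequence at 99% and one at 1% of the DNA.
common_left = single_stranded_fraction(0.99, t=100.0)
rare_left = single_stranded_fraction(0.01, t=100.0)
gain = enrichment(0.01, 0.99, t=100.0)
```

Under these toy values about half of the rare sequence remains single-stranded while almost all of the common sequence has reannealed, so recovering the single-stranded fraction shifts the pool toward the rare sequence.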

[0028] Nucleic acid samples derived using the previously described technique are amplified to complete the normalization
process. For example, samples can be amplified using PCR amplification protocols such as those described
by Ko et al. (Ko, 1990b; Ko, 1990a; Takahashi, 1994), or more preferably, long PCR protocols such as those described
by Barnes (1994) or Cheng (1994).

[0029] Normalization can be performed directly, or steps can also be taken to reduce the complexity of the nucleic 
acid pools prior to the normalization process. Such reduction in complexity can be beneficial in recovering nucleic acid
from the poorly represented organisms.

[0030] The microorganisms from which the libraries may be prepared include prokaryotic microorganisms, such as 
Eubacteria and Archaebacteria, and lower eukaryotic microorganisms such as fungi, some algae and protozoa. The 
microorganisms may be cultured microorganisms or uncultured microorganisms obtained from environmental samples 
and such microorganisms may be extremophiles, such as thermophiles, hyperthermophiles, psychrophiles, psychrotrophs,
etc.

[0031] As indicated above, the library may be produced from environmental samples in which case DNA may be 
recovered without culturing of an organism or the DNA may be recovered from a cultured organism. 
[0032] Sources of microorganism DNA as a starting material library from which target DNA is obtained are particularly 
contemplated to include environmental samples, such as microbial samples obtained from Arctic and Antarctic ice,
water or permafrost sources, materials of volcanic origin, materials from soil or plant sources in tropical areas, etc.
Thus, for example, genomic DNA may be recovered from either a culturable or non-culturable organism and employed 
to produce an appropriate recombinant expression library for subsequent determination of enzyme activity. 
[0033] Bacteria and many eukaryotes have a coordinated mechanism for regulating genes whose products are involved
in related processes. The genes are clustered, in structures referred to as "gene clusters," on a single chromosome
and are transcribed together under the control of a single regulatory sequence, including a single promoter which
initiates transcription of the entire cluster. The gene cluster, the promoter, and additional sequences that function in 
regulation altogether are referred to as an "operon" and can include up to 20 or more genes, usually from 2 to 6 genes. 
Thus, a gene cluster is a group of adjacent genes that are either identical or related, usually as to their function. 
[0034] Some gene families consist of identical members. Clustering is a prerequisite for maintaining identity between
genes, although clustered genes are not necessarily identical. Gene clusters range from extremes where a duplication
is generated to adjacent related genes to cases where hundreds of identical genes lie in a tandem array. Sometimes
no significance is discernible in a repetition of a particular gene. A principal example of this is the expressed duplicate
insulin genes in some species, whereas a single insulin gene is adequate in other mammalian species.
[0035] It is important to further research gene clusters and the extent to which the full length of the cluster is necessary
for the expression of the proteins resulting therefrom. Further, gene clusters undergo continual reorganization and,
thus, the ability to create heterogeneous libraries of gene clusters from, for example, bacterial or other prokaryote
sources is valuable in determining sources of novel proteins, particularly including enzymes such as, for example, the
polyketide synthases that are responsible for the synthesis of polyketides having a vast array of useful activities. Other
types of proteins that are the product(s) of gene clusters are also contemplated, including, for example, antibiotics,
antivirals, antitumor agents and regulatory proteins, such as insulin.

[0036] Polyketides are molecules which are an extremely rich source of bioactivities, including antibiotics (such as 
tetracyclines and erythromycin), anti-cancer agents (daunomycin), immunosuppressants (FK506 and rapamycin), and 
veterinary products (monensin). Many polyketides (produced by polyketide synthases) are valuable as therapeutic
agents. Polyketide synthases are multifunctional enzymes that catalyze the biosynthesis of a huge variety of carbon
chains differing in length and patterns of functionality and cyclization. Polyketide synthase genes fall into gene clusters
and at least one type (designated type I) of polyketide synthases has large size genes and enzymes, complicating
genetic manipulation and in vitro studies of these genes/proteins.
[0037] The ability to select and combine desired components from a library of polyketides and postpolyketide biosynthesis
genes for generation of novel polyketides for study is appealing. The method(s) of the present invention
make possible and facilitate the cloning of novel polyketide synthases, since one can generate gene banks with
clones containing large inserts (especially when using the f-factor based vectors), which facilitates cloning of gene
clusters.

[0038] Preferably, the gene cluster DNA is ligated into a vector, particularly wherein a vector further comprises expression
regulatory sequences which can control and regulate the production of a detectable protein or protein-related
array activity from the ligated gene clusters. Vectors which have an exceptionally large capacity for exogenous
DNA introduction are particularly appropriate for use with such gene clusters and are described by way of example
herein to include the f-factor (or fertility factor) of E. coli. This f-factor of E. coli is a plasmid which effects high-frequency
transfer of itself during conjugation and is ideal for achieving and stably propagating large DNA fragments, such as gene
clusters from mixed microbial samples.

LIBRARY SCREENING: 

[0039] After normalized libraries have been generated, unique enzymatic activities can be discovered using a variety
of solid- or liquid-phase screening assays in a variety of formats, including a high-throughput robotic format described 
herein. The normalization of the DNA used to construct the libraries is a key component in the process. Normalization 
will increase the representation of DNA from important organisms, including those represented in minor amounts in 
the sample. 

[0040] The following items also illustrate the invention:

1. A process for producing a normalized genomic DNA library from an environmental sample, which comprises 
the steps of: 

(a) isolating a genomic DNA population from the environmental sample;

(b) analyzing the complexity of the genomic DNA population so isolated; 

(c) at least one of the steps selected from the group consisting of (i) amplifying the copy number of the DNA 
population so isolated and (ii) recovering a fraction of the isolated genomic DNA having a desired characteristic; 
and 

(d) normalizing the representation of various DNAs within the genomic DNA population so as to form a normalized
library of genomic DNA from the environmental sample.

2. The process of item 1 which comprises the step of recovering a fraction of the isolated genomic DNA having a 
desired characteristic. 


3. The process of item 1 which comprises the step of amplifying the copy number of the DNA population so isolated. 

4. The process of item 1 wherein the step of amplifying the genomic DNA precedes the normalizing step. 

5. The process of item 1 wherein the step of normalizing the genomic DNA precedes the amplifying step.

6. The process of item 1 which comprises both the steps of (i) amplifying the copy number of the DNA population 
so isolated and (ii) recovering a fraction of the isolated genomic DNA having a desired characteristic. 

7. A normalized genomic DNA library formed from an environmental sample by a process comprising the steps of:

(a) isolating a genomic DNA population from the environmental sample; 

(b) analyzing the complexity of the genomic DNA population so isolated; 

(c) at least one of (i) amplifying the copy number of the DNA population so isolated and (ii) recovering a fraction 
of the isolated genomic DNA having a desired characteristic; and

(d) normalizing the representation of various DNAs within the genomic DNA population so as to form a normalized
library of genomic DNA from the environmental sample.

8. The library of item 1 wherein the process of forming said library comprises the step of recovering a fraction of
the isolated genomic DNA having a desired characteristic.

9. The library of item 1 wherein the process of forming said library comprises the step of amplifying the copy number
of the DNA population so isolated.

10. The library of item 1 wherein in the process of forming said library the step of amplifying the genomic DNA
precedes the normalizing step.

11. The library of item 1 wherein in the process of forming said library the step of normalizing the genomic DNA
precedes the amplifying step.

12. The library of item 1 wherein the process of forming said library comprises both the steps of (i) amplifying the
copy number of the DNA population so isolated and (ii) recovering a fraction of the isolated genomic DNA having
a desired characteristic.

13. A process for forming a normalized library of genomic gene clusters from an environmental sample which 
comprises 

(a) isolating a genomic DNA population from the environmental sample;

(b) analyzing the complexity of the genomic DNA population so isolated;

(c) at least one of (i) amplifying the copy number of the DNA population so isolated and (ii) recovering a fraction
of the isolated genomic DNA having a desired characteristic; and

(d) normalizing the representation of various DNAs within the genomic DNA population so as to form a normalized
library of genomic DNA from the environmental sample.

14. A normalized library of genomic gene clusters formed from an environmental sample by a process comprising
the steps of

(a) isolating a genomic DNA population from the environmental sample;

(b) analyzing the complexity of the genomic DNA population so isolated;

(c) at least one of (i) amplifying the copy number of the DNA population so isolated and (ii) recovering a fraction
of the isolated genomic DNA having a desired characteristic; and

(d) normalizing the representation of various DNAs within the genomic DNA population so as to form a normalized
library of genomic DNA from the environmental sample.

Example 1 

DNA Isolation 

[0041]

1. Samples are resuspended directly in the following buffer:

500 mM Tris-HCl, pH 8.0
100 mM NaCl
1 mM sodium citrate
100 µg/ml polyadenosine
5 mg/ml lysozyme

2. Incubate at 37°C for 1 hour with occasional agitation.

3. Digest with 2 mg/ml Proteinase K enzyme (Boehringer Mannheim) at 37°C for 30 min.

4. Add 8 ml of lysis buffer [200 mM Tris-HCl, pH 8.0/100 mM NaCl/4% (wt/vol) SDS/10% (wt/vol) 4-aminosalicylate]
and mix gently by inversion.

5. Perform three cycles of freezing in a dry ice-ethanol bath and thawing in a 65°C water bath to release nucleic
acids.

6. Extract the mixture with phenol and then phenol/chloroform/isoamyl alcohol.

7. Add 4 grams of acid-washed polyvinylpolypyrrolidone (PVPP) to the aqueous phase and incubate 30 minutes
at 37°C to remove organic contamination.

8. Pellet PVPP and filter the supernatant through a 0.45 µm membrane to remove residual PVPP.

9. Precipitate nucleic acids with isopropyl alcohol.

10. Resuspend pellet in 500 µl TE (10 mM Tris-HCl, pH 8.0/1.0 mM EDTA).

11. Add 0.1 g of ammonium acetate and centrifuge mixture at 4°C for 30 minutes.

12. Precipitate nucleic acids with isopropanol.

Example 2 

Bis-Benzimide Separation of DNA

[0042] A sample composed of genomic DNA from Clostridium perfringens (27% G+C), Escherichia coli (49% G+C)
and Micrococcus lysodictium (72% G+C) was purified on a cesium-chloride gradient. The cesium chloride (Rf = 1.3980)
solution was filtered through a 0.2 µm filter and 15 ml were loaded into a 35 ml OptiSeal tube (Beckman). The DNA
was added and thoroughly mixed. Ten micrograms of bis-benzimide (Sigma; Hoechst 33258) were added and mixed
thoroughly. The tube was then filled with the filtered cesium chloride solution and spun in a VTi50 rotor in a Beckman
L8-70 Ultracentrifuge at 33,000 rpm for 72 hours. Following centrifugation, a syringe pump and fractionator (Brandel
Model 186) were used to drive the gradient through an ISCO UA-5 UV absorbance detector set to 280 nm. Three peaks
representing the DNA from the three organisms were obtained. PCR amplification of DNA encoding rRNA from a 10-fold
dilution of the E. coli peak was performed with the following primers to amplify eubacterial sequences:

Forward primer (27F): 5'-AGAGTTTGATCCTGGCTCAG-3'

Reverse primer (1492R): 5'-GGTTACCTTGTTACGACTT-3'



Example 3 

Sample of DNA obtained from the gill tissue of a clam harboring an endosymbiont which cannot be physically
separated from its host

[0043]

1. Purify DNA on cesium chloride gradient according to published protocols (Sambrook, 1989).

2. Prepare second cesium chloride solution (Rf = 1.3980); filter through 0.2 µm filter and load 15 ml into a 35 ml
OptiSeal tube (Beckman).

3. Add 10 µg bis-benzimide (Sigma; Hoechst 33258) and mix.

4. Add 50 µg purified DNA and mix thoroughly.

5. Spin in a VTi50 rotor in a Beckman L8-70 Ultracentrifuge at 33,000 rpm for 72 hours.

6. Use syringe pump and fractionator (Brandel Model 186) to drive gradient through an ISCO UA-5 UV absorbance
detector set to 280 nm.

Example 4

Complexity Analysis 

[0044] 

55 

1. 16S rRNA analysis is used to analyze the complexity of the DNA recovered from environmental samples (Rey- 
senbach, 1992; DeLong, 1992; Barns, 1994) according to the protocol outlined in Example 1. 

2. Eubacterial sequences are amplified using the following primers:

Forward:
5'-AGAGTTTGATCCTGGCTCAG-3'

Reverse:
5'-GGTTACCTTGTTACGACTT-3'

Archaeal sequences are amplified using the following primers:

Forward:
5'-GCGGATCCGCGGCCGCTGCACAYCTGGTYGATYCTGCC-3'

Reverse:
5'-GACGGGCGGTGTGTRCA-3' (R = purine; Y = pyrimidine)

3. Amplification reactions proceed as published. The reaction buffer used in the amplification of the archaeal
sequences includes 5% acetamide (Barns, 1994).

4. The products of the amplification reactions are rendered blunt ended by incubation with Pfu DNA polymerase.

5. The blunt-ended products are ligated into the pCR-Script plasmid in the presence of SrfI restriction endonuclease
according to the manufacturer's protocol (Stratagene Cloning Systems).

6. Samples are sequenced using standard sequencing protocols (reference) and the number of different sequences
present in the sample is determined.
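
The primer notation and the final counting step of this Example can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the disclosed protocol: the function names are hypothetical, the degenerate codes are limited to the two defined above (R = purine, Y = pyrimidine), and counting exact-match sequences is a simplifying assumption, since a real complexity analysis would normally group sequences by similarity.

```python
from itertools import product

# Degenerate codes defined with the primers above: R = purine, Y = pyrimidine.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def expand_primer(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def count_distinct(sequences):
    """Count different sequences among the sequenced clones (exact matches only)."""
    return len({seq.upper() for seq in sequences})


archaeal_forward = "GCGGATCCGCGGCCGCTGCACAYCTGGTYGATYCTGCC"
archaeal_reverse = "GACGGGCGGTGTGTRCA"

# Three Y positions in the forward primer, one R position in the reverse primer.
print(len(expand_primer(archaeal_forward)))
print(expand_primer(archaeal_reverse))
```

Each degenerate position multiplies the number of primer species in the synthesized pool, which is why the forward archaeal primer is a mixture rather than a single oligonucleotide.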

Example 5

Normalization

[0045] Purified DNA is fractionated according to the bis-benzimide protocol of Example 2, and recovered DNA is
sheared or enzymatically digested to 3-6 kb fragments. Lone-linker primers are ligated and the DNA is size selected.
Size-selected DNA is amplified by PCR, if necessary.

[0046] Normalization is then accomplished as follows:

1. Double-stranded DNA sample is resuspended in hybridization buffer (0.12 M NaH2PO4, pH 6.8/0.82 M NaCl/1
mM EDTA/0.1% SDS).

2. Sample is overlaid with mineral oil and denatured by boiling for 10 minutes.

3. Sample is incubated at 68°C for 12-36 hours.

4. Double-stranded DNA is separated from single-stranded DNA according to standard protocols (Sambrook, 1989)
on hydroxyapatite at 60°C.

5. The single-stranded DNA fraction is desalted and amplified by PCR.

6. The process is repeated for several more rounds (up to 5 or more).
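
Why recovering the single-stranded fraction evens out abundances can be illustrated with a toy model. The sketch below assumes ideal second-order reassociation kinetics, in which the fraction of a species still single-stranded after time t is 1 / (1 + k·C0·t); the rate constant, time, and starting abundances are hypothetical values in arbitrary units, and the rescaling step is only a stand-in for the PCR amplification of step 5.

```python
def single_stranded_fraction(c0, k, t):
    """Ideal second-order reassociation: fraction of a species still single-stranded."""
    return 1.0 / (1.0 + k * c0 * t)


def normalize_round(abundances, k, t):
    """One round: keep each species' single-stranded remainder, then rescale to sum to 1."""
    remaining = {
        name: c0 * single_stranded_fraction(c0, k, t)
        for name, c0 in abundances.items()
    }
    total = sum(remaining.values())
    return {name: amount / total for name, amount in remaining.items()}


# Hypothetical two-species mixture with a 9:1 starting ratio.
population = {"abundant": 0.9, "rare": 0.1}
for round_number in range(1, 4):
    population = normalize_round(population, k=1.0, t=100.0)
    ratio = population["abundant"] / population["rare"]
    print(round_number, round(ratio, 4))
```

Abundant species reanneal faster and are removed with the double-stranded fraction, so each round moves the ratio closer to 1, which is consistent with repeating the process for several rounds.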
Example 6

Library Construction

[0047]

1. Genomic DNA dissolved in TE buffer is vigorously passed through a 25 gauge double-hubbed needle until the
sheared fragments are in the desired size range.

2. DNA ends are "polished" or blunted with Mung Bean nuclease.

3. EcoRI restriction sites in the target DNA are protected with EcoRI methylase.

4. EcoRI linkers (GGAATTCC) are ligated to the blunted/protected DNA using a very high molar ratio of linkers to
target DNA.

5. Linkers are cut back with EcoRI restriction endonuclease and the DNA is size fractionated using sucrose
gradients.

6. Target DNA is ligated to the λZAPII vector, packaged using in vitro lambda packaging extracts, and grown in the
appropriate E. coli XL1 Blue host cell.

Example 7 

Library Screening

[0048] The following is a representative example of a procedure for screening an expression library prepared in
accordance with Example 6.

[0049] The general procedures for testing for various chemical characteristics are generally applicable to substrates
other than those specifically referred to in this Example.

[0050] Screening for Activity. Plates of the library prepared as described in Example 6 are used to multiply inoculate
a single plate containing 200 µL of LB Amp/Meth, glycerol in each well. This step is performed using the High Density
Replicating Tool (HDRT) of the Beckman Biomek with a 1% bleach, water, isopropanol, air-dry sterilization cycle
between each inoculation. The single plate is grown for 2 h at 37°C and is then used to inoculate two white 96-well
Dynatech microtiter daughter plates containing 250 µL of LB Amp/Meth, glycerol in each well. The original single plate
is incubated at 37°C for 18 h, then stored at -80°C. The two condensed daughter plates are incubated at 37°C also for
18 h. The condensed daughter plates are then heated at 70°C for 45 min. to kill the cells and inactivate the host E. coli
enzymes. A stock solution of 5 mg/mL morphourea phenylalanyl-7-amino-4-trifluoromethyl coumarin (MuPheAFC, the
'substrate') in DMSO is diluted to 600 µM with 50 mM pH 7.5 Hepes buffer containing 0.6 mg/mL of the detergent
dodecyl maltoside.

MuPheAFC 

[0051] Fifty µL of the 600 µM MuPheAFC solution is added to each of the wells of the white condensed plates with
one 100 µL mix cycle using the Biomek to yield a final concentration of substrate of ~100 µM. The fluorescence values
are recorded (excitation = 400 nm, emission = 505 nm) on a plate reading fluorometer immediately after addition of
the substrate (t=0). The plate is incubated at 70°C for 100 min, then allowed to cool to ambient temperature for 15
additional minutes. The fluorescence values are recorded again (t=100). The values at t=0 are subtracted from the
values at t=100 to determine if an active clone is present.
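
The dilution arithmetic and the t=100 minus t=0 comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the example readings, and the hit threshold are hypothetical, since the text does not specify a cutoff for calling a well active.

```python
def final_concentration(stock_uM, added_uL, well_uL):
    """Concentration after adding a stock volume to the liquid already in a well."""
    return stock_uM * added_uL / (added_uL + well_uL)


def active_wells(t0, t100, threshold):
    """Return wells whose fluorescence increase (t=100 minus t=0) exceeds the threshold."""
    return sorted(well for well in t0 if t100[well] - t0[well] > threshold)


# 50 uL of 600 uM substrate added to a 250 uL daughter-plate well.
print(final_concentration(600, 50, 250))

# Hypothetical fluorescence readings (arbitrary units) for three wells.
readings_t0 = {"A1": 110, "A2": 105, "A3": 120}
readings_t100 = {"A1": 130, "A2": 980, "A3": 125}
print(active_wells(readings_t0, readings_t100, threshold=200))
```

Subtracting the t=0 reading removes each well's background fluorescence, so only the signal generated during the 70°C incubation is compared with the cutoff.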
[0052] The data will indicate whether one of the clones in a particular well is hydrolyzing the substrate. In order to
determine the individual clone which carries the activity, the source library plates are thawed and the individual clones
are used to singly inoculate a new plate containing LB Amp/Meth, glycerol. As above, the plate is incubated at 37°C
to grow the cells, heated at 70°C to inactivate the host enzymes, and 50 µL of 600 µM MuPheAFC is added using the
Biomek. Additionally, three other substrates are tested: methyl umbelliferone heptanoate, the CBZ-arginine
rhodamine derivative, and fluorescein-conjugated casein (~3.2 mol fluorescein per mol of casein).

[0053] The umbelliferone and rhodamine are added as 600 µM stock solutions in 50 µL of Hepes buffer. The
fluorescein-conjugated casein is also added in 50 µL at a stock concentration of 20 and 200 mg/mL. After addition of the
substrates the t=0 fluorescence values are recorded, the plate is incubated at 70°C, and the t=100 min. values are
recorded as above.

[0054] These data indicate which plate the active clone is in, and show that the arginine rhodamine derivative is also
turned over by this activity, whereas the lipase substrate, methyl umbelliferone heptanoate, and the protein substrate,
fluorescein-conjugated casein, do not function as substrates.

[0055] Chiral amino esters may be determined using at least the following substrates: For each substrate which is
turned over the enantioselectivity value, E, is determined according to the equation below:

$$E = \frac{\ln[1 - c(1 + ee_p)]}{\ln[1 - c(1 - ee_p)]}$$

where ee_p = the enantiomeric excess (ee) of the hydrolyzed product and c = the percent conversion of the reaction
(entered in the equation as a fraction between 0 and 1).
See Wong and Whitesides, Enzymes in Synthetic Organic Chemistry, 1994, Elsevier, Tarrytown, New York, pp. 9-12.
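
As a worked illustration of the enantioselectivity equation, the following sketch evaluates E from a conversion and a product enantiomeric excess. The function name and the input checks are assumptions added for the example; both inputs are taken as fractions (for instance, 50% conversion is entered as 0.5).

```python
import math


def enantioselectivity(c, ee_p):
    """E = ln[1 - c(1 + ee_p)] / ln[1 - c(1 - ee_p)], with c and ee_p as fractions."""
    if not 0.0 < c < 1.0:
        raise ValueError("conversion c must lie strictly between 0 and 1")
    if not 0.0 <= ee_p < 1.0:
        raise ValueError("ee_p must be at least 0 and below 1")
    if c * (1.0 + ee_p) >= 1.0:
        raise ValueError("c(1 + ee_p) must be below 1 for the logarithm to be defined")
    numerator = math.log(1.0 - c * (1.0 + ee_p))
    denominator = math.log(1.0 - c * (1.0 - ee_p))
    return numerator / denominator


# Example: 50% conversion with a product ee of 90%.
print(round(enantioselectivity(0.5, 0.9), 1))
```

The guard on c(1 + ee_p) reflects the fact that the numerator's logarithm is undefined once the argument reaches zero, which limits the combinations of conversion and ee_p the equation can describe.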
[0056] The enantiomeric excess is determined by either chiral high performance liquid chromatography (HPLC) or
chiral capillary electrophoresis (CE). Assays are performed as follows: two hundred µL of the appropriate buffer is
added to each well of a 96-well white microtiter plate, followed by 50 µL of partially or completely purified enzyme
solution; 50 µL of substrate is added and the increase in fluorescence monitored versus time until 50% of the substrate
is consumed or the reaction stops, whichever comes first.

Example 8

Construction of a Stable, Large Insert Picoplankton Genomic DNA Library 

[0057] Cell collection and preparation of DNA. Agarose plugs containing concentrated picoplankton cells were
prepared from samples collected on an oceanographic cruise from Newport, Oregon to Honolulu, Hawaii. Seawater
(30 liters) was collected in Niskin bottles, screened through 10 µm Nitex, and concentrated by hollow fiber filtration
(Amicon DC10) through 30,000 MW cutoff polysulfone filters. The concentrated bacterioplankton cells were collected
on a 0.22 µm, 47 mm Durapore filter, and resuspended in 1 ml of 2X STE buffer (1 M NaCl, 0.1 M EDTA, 10 mM Tris,
pH 8.0) to a final density of approximately 1 x 10^10 cells per ml. The cell suspension was mixed with one volume of
1% molten Seaplaque LMP agarose (FMC) cooled to 40°C, and then immediately drawn into a 1 ml syringe. The syringe
was sealed with parafilm and placed on ice for 10 min. The cell-containing agarose plug was extruded into 10 ml of
Lysis Buffer (10 mM Tris pH 8.0, 50 mM NaCl, 0.1 M EDTA, 1% Sarkosyl, 0.2% sodium deoxycholate, 1 mg/ml lysozyme)
and incubated at 37°C for one hour. The agarose plug was then transferred to 40 ml of ESP Buffer (1% Sarkosyl, 1
mg/ml proteinase K, in 0.5 M EDTA), and incubated at 55°C for 16 hours. The solution was decanted and replaced with
fresh ESP Buffer, and incubated at 55°C for an additional hour. The agarose plugs were then placed in 50 mM EDTA
and stored at 4°C shipboard for the duration of the oceanographic cruise.

[0058] One slice of an agarose plug (72 µl) prepared from a sample collected off the Oregon coast was dialyzed
overnight at 4°C against 1 mL of buffer A (100 mM NaCl, 10 mM Bis Tris Propane-HCl, 100 ng/ml acetylated BSA: pH
7.0 @ 25°C) in a 2 mL microcentrifuge tube. The solution was replaced with 250 µl of fresh buffer A containing 10 mM
MgCl2 and 1 mM DTT and incubated on a rocking platform for 1 hr at room temperature. The solution was then changed
to 250 µl of the same buffer containing 4U of Sau3A1 (NEB), equilibrated to 37°C in a water bath, and then incubated
on a rocking platform in a 37°C incubator for 45 min. The plug was transferred to a 1.5 ml microcentrifuge tube and
incubated at 68°C for 30 min to inactivate the enzyme and to melt the agarose. The agarose was digested and the
DNA dephosphorylated using Gelase and HK-phosphatase (Epicentre), respectively, according to the manufacturer's
recommendations. Protein was removed by gentle phenol/chloroform extraction and the DNA was ethanol precipitated,
pelleted, and then washed with 70% ethanol. This partially digested DNA was resuspended in sterile H2O to a
concentration of 2.5 ng/µl for ligation to the pFOS1 vector.

[0059] PCR amplification results from several of the agarose plugs (data not shown) indicated the presence of
significant amounts of archaeal DNA. Quantitative hybridization experiments using rRNA extracted from one sample,
collected at 200 m of depth off the Oregon Coast, indicated that planktonic archaea in this assemblage comprised
approximately 4.7% of the total picoplankton biomass (this sample corresponds to "PAC1"-200 m in Table 1 of DeLong
et al., High abundance of Archaea in Antarctic marine picoplankton, Nature, 371:695-698, 1994). Results from
archaeal-biased rDNA PCR amplification performed on agarose plug lysates confirmed the presence of relatively large amounts
of archaeal DNA in this sample. Agarose plugs prepared from this picoplankton sample were chosen for subsequent
fosmid library preparation. Each 1 ml agarose plug from this site contained approximately 7.5 x 10^5 cells, therefore
approximately 5.4 x 10^5 cells were present in the 72 µl slice used in the preparation of the partially digested DNA.
[0060] Vector arms were prepared from pFOS1 as described (Kim et al., Stable propagation of cosmid sized human
DNA inserts in an F factor based vector, Nucl. Acids Res., 20:10832-10835, 1992). Briefly, the plasmid was completely
digested with AstII, dephosphorylated with HK phosphatase, and then digested with BamHI to generate two arms,
each of which contained a cos site in the proper orientation for cloning and packaging ligated DNA between 35-45 kbp.
The partially digested picoplankton DNA was ligated overnight to the pFOS1 arms in a 15 µl ligation reaction containing
25 ng each of vector and insert and 1U of T4 DNA ligase (Boehringer-Mannheim). The ligated DNA in four microliters
of this reaction was in vitro packaged using the Gigapack XL packaging system (Stratagene), the fosmid particles
transfected to E. coli strain DH10B (BRL), and the cells spread onto LB cm15 plates. The resultant fosmid clones were
picked into 96-well microtiter dishes containing LB cm15 supplemented with 7% glycerol. Recombinant fosmids, each
containing ca. 40 kb of picoplankton DNA insert, yielded a library of 3,552 fosmid clones, containing approximately 1.4
x 10^8 base pairs of cloned DNA. All of the clones examined contained inserts ranging from 38 to 42 kbp. This library
was stored frozen at -80°C for later analysis.
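
The library size quoted above follows directly from the clone count and the approximate insert size. The short sketch below reproduces that arithmetic; the function names are hypothetical, and the 40 kb figure is the approximate mean stated in the text rather than a measured average.

```python
def cloned_base_pairs(n_clones, mean_insert_bp):
    """Total cloned DNA, assuming every clone carries an insert of the mean size."""
    return n_clones * mean_insert_bp


def plates_needed(n_clones, wells_per_plate=96):
    """Number of microtiter plates required to hold one clone per well."""
    return -(-n_clones // wells_per_plate)  # ceiling division


total_bp = cloned_base_pairs(3552, 40_000)
print(f"{total_bp:.2e} bp of cloned DNA")
print(plates_needed(3552), "96-well plates")
```

With the stated insert range of 38 to 42 kbp, the same calculation gives bounds of roughly 1.35 x 10^8 to 1.49 x 10^8 base pairs.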

[0061] Numerous modifications and variations of the present invention are possible in light of the above teachings; 
therefore, within the scope of the claims, the invention may be practiced other than as particularly described. 
TABLE 1

A2
Fluorescein conjugated casein (3.2 mol fluorescein/mol casein)
CBZ-Ala-AMC
t-BOC-Ala-Ala-Asp-AMC
succinyl-Ala-Gly-Leu-AMC
CBZ-Arg-AMC
CBZ-Met-AMC
morphourea-Phe-AMC
(t-BOC = t-butoxy carbonyl, CBZ = carbonyl benzyloxy, AMC = 7-amino-4-methyl coumarin)

AA3, AB3, AC3
[chemical structures; not recoverable from the source text]

AD3
Fluorescein conjugated casein
t-BOC-Ala-Ala-Asp-AFC
CBZ-Ala-Ala-Lys-AFC
succinyl-Ala-Ala-Phe-AFC
succinyl-Ala-Gly-Leu-AFC
(AFC = 7-amino-4-trifluoromethyl coumarin)

AE3
Fluorescein conjugated casein

AF3
t-BOC-Ala-Ala-Asp-AFC
CBZ-Asp-AFC

AG3
CBZ-Ala-Ala-Lys-AFC
CBZ-Arg-AFC

AH3
succinyl-Ala-Ala-Phe-AFC
CBZ-Phe-AFC
CBZ-Trp-AFC

AI3
succinyl-Ala-Gly-Leu-AFC
CBZ-Ala-AFC
CBZ-Ser-AFC
[Table of chemical structures labeled L2, LA3, LB3, LD3 ("And all of L2"), LE3 and LG3; structures not recoverable from the source text]
TABLE 3

[Chemical structures labeled LI3 ("And all of L2"), LJ3, LK3, LL3, LN3 and LO3; structures not recoverable from the source text]
TABLE 4

4-methyl umbelliferone substrates, wherein R =
β-D-galactose
β-D-glucose
β-D-glucuronide
β-D-cellotrioside
β-D-cellobiopyranoside
β-D-galactose
α-D-galactose
β-D-glucose
α-D-glucose
β-D-glucuronide
β-D-N,N-diacetylchitobiose
β-D-fucose
α-L-fucose
β-L-fucose
β-D-mannose
α-D-mannose

Substrate group labels: G2, GB3, GC3, GD3, GE3, GI3, GJ3, GK3
[assignment of R groups to labels not recoverable from the source text]

non-Umbelliferyl substrates

GA3 amylose [polyglucan α1,4 linkages], amylopectin [polyglucan branching α1,6 linkages]
GF3 xylan [poly 1,4-D-xylan]
GG3 amylopectin, pullulan
GH3 sucrose, fructofuranoside
SEQUENCE LISTING

(1) GENERAL INFORMATION

(i) APPLICANT: Recombinant Biocatalysis, Inc.

(ii) TITLE OF INVENTION: PRODUCTION AND USE OF NORMALIZED DNA LIBRARIES

(iii) NUMBER OF SEQUENCES: 10 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: FISH & RICHARDSON

(B) STREET: 4225 EXECUTIVE SQUARE, STE. 1400 

(C) CITY: LA JOLLA

(D) STATE: CA 

(E) COUNTRY: USA

(F) ZIP: 92037

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: 3.5 INCH DISKETTE 

(B) COMPUTER: IBM PS/2 

(C) OPERATING SYSTEM: MS-DOS

(D) SOFTWARE: WORD PERFECT 6.0 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: Unassigned 

(B) FILING DATE: 18 June 1997

(C) CLASSIFICATION: Unassigned

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/665,565 

(B) FILING DATE: 18 June 1996 

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: LISA A. HAILE, Ph.D. 

(B) REGISTRATION NUMBER: 38,347 

(C) REFERENCE/DOCKET NUMBER: 09010/019WO1

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 619-678-5070 

(B) TELEFAX: 619-678-5099 
(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 52 NUCLEOTIDES

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGATTGAA GACCCTATGG AC


(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 31 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
CGGAAGATCT TTAAGCACTT CTCTCAGGTT C 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 52 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGGACAGG CTTGAAAAAG TA 



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 31 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
CGGAAGATCT TCAGCTAAGC TTCTCTAAGA A

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 52 NUCLEOTIDES

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
CCGACAATTG ATTAAAGAGG AGAAATTAAC TATGTGGGAA TTAGACCCTA AA 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 31 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
CGGAGGATCC CTACACCTGT TTTTCAAGCT C 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 52 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
CCGACAATTG ATTAAAGAGG AGAAATTAAC TATGACATAC TTAATGAACA AT 



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 31 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
CGGAAGATCT TTATGAGAAG TCCCTTTCAA G 
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 52 NUCLEOTIDES

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGCGGAAA CTGGCCGAGC GG 



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 31 NUCLEOTIDES

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: 
CGGAGGATCC TTAAAGTGCC GCTTCGATCA A 
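
The declared lengths in the sequence listing can be checked mechanically. The sketch below is only an illustration: the helper names are hypothetical, and the scan for restriction enzyme recognition sequences (EcoRI GAATTC, BglII AGATCT, BamHI GGATCC, MfeI CAATTG) is an observation added here, not something the listing itself states about how the oligonucleotides were designed.

```python
# Sequences copied from the listing above, paired with their declared lengths.
LISTING = {
    1: ("CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGATTGAA GACCCTATGG AC", 52),
    2: ("CGGAAGATCT TTAAGCACTT CTCTCAGGTT C", 31),
    3: ("CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGGACAGG CTTGAAAAAG TA", 52),
    4: ("CGGAAGATCT TCAGCTAAGC TTCTCTAAGA A", 31),
    5: ("CCGACAATTG ATTAAAGAGG AGAAATTAAC TATGTGGGAA TTAGACCCTA AA", 52),
    6: ("CGGAGGATCC CTACACCTGT TTTTCAAGCT C", 31),
    7: ("CCGACAATTG ATTAAAGAGG AGAAATTAAC TATGACATAC TTAATGAACA AT", 52),
    8: ("CGGAAGATCT TTATGAGAAG TCCCTTTCAA G", 31),
    9: ("CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGCGGAAA CTGGCCGAGC GG", 52),
    10: ("CGGAGGATCC TTAAAGTGCC GCTTCGATCA A", 31),
}

# Well-known recognition sequences; their relevance to these oligos is an assumption.
SITES = {"EcoRI": "GAATTC", "BglII": "AGATCT", "BamHI": "GGATCC", "MfeI": "CAATTG"}


def sequence(seq_id):
    """Return the sequence for a SEQ ID NO with the block spacing removed."""
    return LISTING[seq_id][0].replace(" ", "")


def length_mismatches():
    """List the SEQ ID NOs whose actual length differs from the declared length."""
    return [i for i, (_, declared) in LISTING.items() if len(sequence(i)) != declared]


def sites_in(seq_id):
    """Names of the recognition sequences from SITES found in a given sequence."""
    return [name for name, site in SITES.items() if site in sequence(seq_id)]


print(length_mismatches())
for seq_id in LISTING:
    print(seq_id, sites_in(seq_id))
```

An empty list from `length_mismatches()` indicates that every transcribed sequence agrees with its LENGTH field, which is a quick way to catch characters lost or inserted during text extraction.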



Claims 

1. A method for forming a normalized DNA library from a mixed population of organisms, which comprises: 

(a) obtaining a DNA population from the mixed population of organisms; 

(b) at least one of the steps selected from the group consisting of (i) amplifying the copy number of the DNA 
population so isolated and (ii) recovering a fraction of the isolated DNA having a desired characteristic; and 

(c) normalizing the representation of various DNAs within the DNA population so as to form a normalized 
library of DNA from the mixed population of organisms. 

2. The method of claim 1, further comprising prior to (b) fractionating the DNA population by contacting the DNA 
population with an intercalating agent and separating the DNA. 

3. The method of claim 2, wherein the intercalating agent is bis-benzimide. 

4. The method of claim 1, which comprises recovering a fraction of the isolated DNA having a desired characteristic. 

5. The method of claim 1, which comprises amplifying the copy number of the DNA population so isolated. 

6. The method of claim 1, wherein the amplifying the DNA precedes normalizing. 

7. The method of claim 1, wherein normalizing the DNA precedes amplifying. 

8. The method of claim 1, which comprises both the steps of (i) amplifying the copy number of the DNA population 
so isolated and (ii) recovering a fraction of the isolated DNA having a desired characteristic. 

9. A normalized DNA library formed from a mixed population of organisms by a method comprising:

(a) obtaining a DNA population from the mixed population of organisms;

(b) at least one of (i) amplifying the copy number of the DNA population so isolated and (ii) recovering a fraction 
of the isolated DNA having a desired characteristic; 

(c) normalizing the representation of various DNAs within the DNA population; and 

(d) transforming host cells with the DNA of (c) so as to form a normalized library of DNA from the mixed 
population of organisms. 

10. The library of claim 9, further comprising prior to (b) fractionating the DNA population by contacting the DNA 
population with an intercalating agent and separating the DNA. 

11. The library of claim 10, wherein the intercalating agent is bis-benzimide. 

12. The library of claim 9, wherein the method of forming said library comprises recovering a fraction of the isolated 
DNA having a desired characteristic. 

13. The library of claim 9, wherein the method of forming said library comprises amplifying the copy number of the 
DNA population so isolated. 

14. The library of claim 9, wherein in the method of forming said library amplifying the DNA precedes normalizing. 

15. The library of claim 9, wherein in the method of forming said library normalizing the DNA precedes amplifying. 



16. The library of claim 9, wherein the method of forming said library comprises both the steps of (i) amplifying the
copy number of the DNA population so isolated and (ii) recovering a fraction of the isolated DNA having a desired
characteristic.

17. A method for producing a normalized library of gene clusters from a mixed population of organisms, which
comprises:

(a) obtaining a DNA population from the mixed population of organisms;

(b) at least one of (i) amplifying the copy number of the DNA population so isolated and (ii) recovering a fraction 
of the isolated DNA having a desired characteristic; and 

(c) normalizing the representation of various DNAs within the DNA population so as to produce a normalized 
library of DNA from the mixed population of organisms. 



18. The method of claim 17, further comprising prior to (b) fractionating the DNA population by contacting the DNA 
population with an intercalating agent and separating the DNA. 

19. A normalized library of gene clusters formed from a mixed population of organisms by a method comprising: 



(a) obtaining a DNA population from the mixed population of organisms; 

(b) at least one of (i) amplifying the copy number of the DNA population so isolated and (ii) recovering a fraction 
of the isolated DNA having a desired characteristic; and 

(c) normalizing the representation of various DNAs within the DNA population so as to form a normalized
library of DNA from the mixed population of organisms.






EP 1 528 067 A2 



Figure 1. Plot against % G+C (x-axis, 20 to 70; y-axis scale 0 to 0.7, axis label not recoverable) with data series
pacscum, sqsymbnt, wnale801 and Control.



(19) Europäisches Patentamt / European Patent Office / Office européen des brevets

(11) EP 1 528 067 A3

(12) EUROPEAN PATENT APPLICATION

(88) Date of publication A3: 01.02.2006 Bulletin 2006/05

(43) Date of publication A2: 04.05.2005 Bulletin 2005/18

(51) Int Cl.: C07H 21/02 (2006.01), C07H 21/04 (2006.01), C12P 19/34 (2006.01), C12Q 1/68 (2006.01), C12N 15/10

(21) Application number: 04024843.7

(22) Date of filing: 18.06.1997

(84) Designated Contracting States: AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

(30) Priority: 18.06.1996 US 665565

(62) Document number(s) of the earlier application(s) in accordance with Art. 76 EPC: 97930172.8 / 0 923 598

(71) Applicant: DIVERSA CORPORATION, San Diego, CA 92121 (US)

(72) Inventors:
• Short, Jay, M., Del Mar, CA 92014 (US)
• Mathur, Eric, J., Carlsbad, CA 92009 (US)

(74) Representative: Vossius & Partner, Siebertstrasse 4, 81675 München (DE)






Printed by Jouve. 75001 PARIS (FR) 



EUROPEAN SEARCH REPORT

Application Number: EP 04 02 4843

DOCUMENTS CONSIDERED TO BE RELEVANT

Category: D,A
Citation: PATANJALI S R ET AL: "CONSTRUCTION OF A UNIFORM-ABUNDANCE (NORMALIZED) CDNA LIBRARY",
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA, US, NATIONAL ACADEMY OF SCIENCE. WASHINGTON,
vol. 88, no. 5, 1 March 1991 (1991-03-01), pages 1943-1947, XP000368687, ISSN: 0027-8424
* the whole document *
Relevant to claim: 1-19

Citation: STEIN J L ET AL: "Characterization of uncultivated prokaryotes: isolation and analysis of a
40-kilobase-pair genome fragment from a planktonic marine archaeon.",
JOURNAL OF BACTERIOLOGY, (1996 FEB) 178 (3) 591-9., February 1996 (1996-02), XP002050143
* the whole document *
Relevant to claim: 1-19

Citation: WO 95/08647 A (UNIV COLUMBIA; SOARES MARCELO B (US); EFSTRATIADIS ARGIRIS (US))
30 March 1995 (1995-03-30)
* page 5, line 1 - line 35; claims *
Relevant to claim: 1-19

Citation: SOARES M.B. ET AL.: "Construction and characterization of a normalized cDNA library",
PROCEEDINGS NATIONAL ACADEMY OF SCIENCES, vol. 91, September 1994 (1994-09), pages 9228-9232, XP002138140, USA
* the whole document *
Relevant to claim: 1-19

Citation: SIMPSON L: "Isolation of maxicircle component of kinetoplast DNA from hemoflagellate protozoa.",
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA. APR 1979,
vol. 76, no. 4, April 1979 (1979-04), pages 1585-1588, XP002356390, ISSN: 0027-8424
* the whole document *
Relevant to claim: 2,3,10,11,18

Citation: ALDRICH J ET AL: "ISOLATION AND CHARACTERIZATION OF CHLOROPLAST DNA FROM THE MARINE CHROMOPHYTE
OLISTHODISCUS-LUTEUS ELECTRON MICROSCOPIC VISUALIZATION OF ISOMERIC MOLECULAR FORMS",
PLANT PHYSIOLOGY (ROCKVILLE), vol. 68, no. 3, 1981, pages 641-647, XP002356391, ISSN: 0032-0889
* the whole document *
Relevant to claim: 2,3,10,11,18

CLASSIFICATION OF THE APPLICATION (IPC): C07H21/02, C07H21/04, C12P19/34, C12Q1/68, C12N15/10

TECHNICAL FIELDS SEARCHED (IPC): C12N, C12Q

The present search report has been drawn up for all claims

Place of search: Munich
Date of completion of the search: 28 November 2005
Examiner: Luzzatto, E

CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the same category
A : technological background
O : non-written disclosure
P : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons
& : member of the same patent family, corresponding document



ANNEX TO THE EUROPEAN SEARCH REPORT
ON EUROPEAN PATENT APPLICATION NO. EP 04 02 4843

This annex lists the patent family members relating to the patent documents cited in the above-mentioned European search report.
The members are as contained in the European Patent Office EDP file on 28-11-2005.
The European Patent Office is in no way liable for these particulars which are merely given for the purpose of information.

Patent document cited    Publication    Patent family     Publication
in search report         date           member(s)         date

WO 9508647 A             30-03-1995     AU 7842594 A      10-04-1995
                                        US 5482845 A      09-01-1996
                                        US 5637685 A      10-06-1997
                                        US 5830662 A      03-11-1998

For more details about this annex: see Official Journal of the European Patent Office, No. 12/82



